*====================================================================*
*Everything you wanted to know about Forth (but were afraid to ask). *
*              Copyright (c) 1986, by Scott Ballantyne.              *
*====================================================================*

This file contains a description of how a threaded-code Forth compiler works, with specific reference to Blazin' Forth. You don't need to know the stuff in this file, unless you are interested in the particulars of how Forth compilers work, or are interested in improving or changing one. Specifically, I wrote the following as an aid to people who might be trying to understand the source to Blazin' Forth, and it should be considered part of the documentation for the source files to the system.

To understand this document, you will need a decent knowledge of Forth. An understanding of pointers won't hurt any either. You don't need to know machine language to understand the stuff in this file, but you will, of course, need it to understand the actual source.

I have attempted to provide sample code in hi-level forth that illustrates the routines involved. This code is similar, but not exactly the same, as the actual machine level routines in the actual compiler. In particular, you should not expect these routines to actually work if you type them into a forth system in an attempt to build a "Forth in Forth". They are provided to add clarity, and that is their only function.

---------------------------------------------------------------------------
------------------------ What is a Virtual Machine? -----------------------
---------------------------------------------------------------------------

A Virtual Machine is a creation, in software, of a piece of hardware. Note that this hardware does not actually have to exist - it is only within the last year that an actual hardware Forth computer has been built. Forth has been around a lot longer than that.
All high level languages are essentially virtual machines, since they implement instructions which are not part of the actual hardware CPU. As an example, Forth uses two stacks, a parameter stack, and a return stack. On an actual hardware Forth computer, the built-in machine language of the computer would contain instructions for manipulating each stack, and the pointers to the bottom of each stack. On the 6502, there is actually only one stack - one or the other of the Forth stacks must be emulated using software routines. So you could say that one stack is a hardware stack, and the other stack is a Virtual stack.

As another example, the function call mechanism (how the actual functions, procedures, or words are eventually caused to execute) of a high level language is rarely directly supported by hardware instructions. On the majority of CPU's in use on personal computers today, the only real function call mechanism implemented in the hardware is the subroutine call (usually referred to as a JSR or CALL instruction). This instruction usually only saves the return address automatically. Any other information or any other method of invoking a subroutine must be done by software that, essentially, is a software (or virtual) function call, as opposed to the hardware function call and return.

This is particularly true of threaded code Forth, and the routine which implements this function call mechanism is called NEXT. Understanding NEXT is the key to understanding the functioning of the Forth compiler at its lowest level.

--------------------------------------------------------------------------
-------------------------- Introducing NEXT. -----------------------------
--------------------------------------------------------------------------

To understand how NEXT (and the various Forth machine registers that NEXT uses to do its thing) works, let's first take a quick look at the structure of a compiled higher level Forth word.
It is essentially a list of addresses:

    : EXAMPLE W1 W2 W3 ;

    ( Standard Forth Header goes here)
    Address of W1
    Address of W2
    Address of W3
    Address of EXIT     ( ; )

NEXT uses several auxiliary registers to keep track of where the user program is. On a CPU with many registers, these would be kept in a selected CPU register. On the 6502, which has only 4 user accessible registers, these are maintained as virtual registers (page zero locations are used for greater speed).

One of these virtual registers is called the Interpreter Pointer, or IP for short, and it is responsible for keeping track of the progress of the current program. When NEXT is entered, in the course of running a program, the IP will be pointed at the word we want to execute. NEXT does some stuff (to be described in a moment) to cause this word to begin to execute, but before transferring control to this word, it moves the IP ahead to the next word's address, so it will know what word to run the next time it is called.

Here is a sample execution of EXAMPLE, given above:

    IP --> Address of W1    ( NEXT executes W1, first moving IP ahead to W2)
    IP --> Address of W2    ( NEXT executes W2, first moving IP ahead to W3)
    IP --> Address of W3    ( NEXT executes W3, first moving IP ahead to EXIT)
    IP --> Address of EXIT

I like to think of the IP as a kind of Address Slider, that can be moved ahead or behind to direct the flow of execution of the current program.

Ultimately, of course, NEXT must cause machine language instructions to be executed, which essentially means changing the hardware program counter of the CPU to point to the appropriate batch of instructions to be executed. It does this using another virtual register called the Current Word Pointer, or W for short.

To understand this portion of NEXT, we have to clear up exactly what we mean by "Address of Word" in the above discussion.
The individual members of the list that makes up a Forth definition's executable body are the addresses of the code field in the header of the compiled word. (These "addresses of code fields" will be referred to as "the execution address" of a word in the rest of this document. When the term "code field" is used, the reference will be to the actual code field portion of a dictionary header. Or, at least, that is how I am going to try to use these terms.)

This execution address (as you may recall) itself stores an address which points to machine level (assembly language) instructions. It is these instructions that NEXT causes the CPU to execute, by forcing the CPU's program counter to the address stored at the execution address of that word.

So we have a couple of levels of indirection here: The IP points to a location which holds the execution address of a word. The execution address pointed to by the location pointed to by the IP points to executable machine language instructions.

So the full story of NEXT is as follows:

    1) NEXT retrieves the value stored at the address in the IP.
    2) It saves this value ( the execution address of a word) in W
       (the current word pointer).
    3) It then moves the IP ahead to point at the next word to be executed.
    4) Finally, it forces the hardware program counter to the value stored
       at the address in W, which causes machine level instructions to
       execute.

Here is an example of the full execution of a forth word. Let's make up some example addresses for our example execution:

    Address   Contents    Description
    -----------------------------------------------------------------------
    $A000     [ $0600 ]   W1's execution address is $A000, and contains $0600.
    $B000     [ $0700 ]   W2's execution address is $B000, and contains $0700.
    $C000     [ $0800 ]   W3's execution address is $C000, and contains $0800.
    $0900     [ $0880 ]   EXIT's execution address is $0900, and contains $0880.

And here is how the compiled EXAMPLE word from earlier looks - let's say that the body of EXAMPLE starts at $E000:

    Address   Contents    Description
    ----------------------------------
    $E000     [ $A000 ]   Compiled W1
    $E002     [ $B000 ]   Compiled W2
    $E004     [ $C000 ]   Compiled W3
    $E006     [ $0900 ]   Compiled EXIT.

So, at the entry to NEXT, the IP will contain $E000. NEXT fetches the address stored here, and stuffs it into W, so W will now contain $A000, which is the execution address of W1. NEXT now increments the IP to point at the next word, so the IP will contain $E002. Finally, NEXT forces the program counter of the CPU to the address stored in the address stored in W. So here's a quickie quiz - what will be the address in the hardware program counter? (Answer: $0600 - which is the address of the machine language code for W1).

Here is a quick synopsis of the values stored in the IP, W, and hardware PC for the execution of EXAMPLE, given above. It might be a good idea to pause here, and try to run through the rest of the example on your own, to check your understanding (and the clarity of my explanation) of how NEXT functions.

    Word-to-Execute   IP      W       IP-AT-EXIT   PC
    ----------------------------------------------------------
    W1                $E000   $A000   $E002        $0600
    W2                $E002   $B000   $E004        $0700
    W3                $E004   $C000   $E006        $0800
    EXIT              $E006   $0900   $E008        $0880

As a final aid to understanding, here is an implementation of NEXT in hi-level Forth:

    : NEXT
        IP          // Get address of IP
        @           // Get value of IP (address of next word to execute)
        @           // Get that word's execution address
        W !         // And stuff into the current word pointer.
        2 IP +!     // Move IP along to next word, for next time.
        W @         // Get the execution address from W.
        @           // Get the actual address of the code.
        PC !        // Force into hardware PC, so that it will execute.
    ;

Now that you understand NEXT (I hope), and the role of the Forth registers IP and W, you are in a good position to understand the rest of the Forth system.
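
The synopsis table above can also be checked mechanically. What follows is a small Python sketch, not part of the Blazin' Forth sources, that models memory as a dictionary of 16-bit cells and applies the four steps of NEXT to the example addresses. The names `memory`, `next_step` and `trace` are invented for this illustration, and the "hardware PC" is simply a number handed back to the caller rather than a real jump.

```python
# Hypothetical model: memory maps an address to the 16-bit cell stored there.
memory = {
    # code fields (execution addresses) of W1, W2, W3 and EXIT
    0xA000: 0x0600,
    0xB000: 0x0700,
    0xC000: 0x0800,
    0x0900: 0x0880,
    # compiled body of EXAMPLE
    0xE000: 0xA000,
    0xE002: 0xB000,
    0xE004: 0xC000,
    0xE006: 0x0900,
}


def next_step(ip):
    """Apply the four steps of NEXT; return (w, new_ip, pc)."""
    w = memory[ip]      # steps 1 and 2: fetch the execution address into W
    new_ip = ip + 2     # step 3: move the IP ahead by one 2-byte cell
    pc = memory[w]      # step 4: address of the machine code to run
    return w, new_ip, pc


def trace(ip, count):
    """Step NEXT `count` times, collecting (IP, W, IP-AT-EXIT, PC) rows."""
    rows = []
    for _ in range(count):
        w, new_ip, pc = next_step(ip)
        rows.append((ip, w, new_ip, pc))
        ip = new_ip
    return rows


for ip, w, ip_at_exit, pc in trace(0xE000, 4):
    print(f"${ip:04X}  ${w:04X}  ${ip_at_exit:04X}  ${pc:04X}")
```

The four printed rows should match the IP, W, IP-AT-EXIT and PC columns of the table. Note that the sketch only steps the IP through the body of EXAMPLE; it never runs the machine code at $0600 and friends, so EXIT does not actually return anywhere here.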

---------------------------------------------------------------------------
---------------- EXECUTE - or how Forth launches programs -----------------
---------------------------------------------------------------------------

You might be wondering at this point exactly how an application gets launched in the first place. Since NEXT uses the IP, and assumes that the IP is pointing at a compiled execution address, how do words that you just type in from the terminal get executed? Obviously, words typed directly to the interpreter from the terminal don't have an address which is valid for the IP.

The answer is the Forth word EXECUTE, which takes an execution address as its argument. When you type a word to the interpreter that it can find in the dictionary, it pushes the execution address of the word onto the parameter stack, and calls EXECUTE. EXECUTE first saves this execution address in W, and then forces the PC to the address stored in this execution address, just like the last part of NEXT.

Here is EXECUTE in high level forth:

    : EXECUTE ( execution-address --- )
        W !         // save in W - then do last part of NEXT
        W @ @       // get the address of the code to execute
        PC !        // and execute it.
    ;

Note that EXECUTE does not call NEXT - it assumes the EXECUTEd word will be doing that.

At this point you are no doubt wondering how the IP gets initialized at all. It's not hard to understand, but let's put off a detailed discussion of it until we talk about the DOCOLON and EXIT routines, a little further on. Here is a brief hint: When EXECUTE executes your word, there is already a valid value in the IP - it is pointing somewhere inside INTERPRET. If the word you are executing is a colon definition, then the first thing it does is save the current value of the IP, and then changes it to point to itself.
A CODE definition won't change the IP at all (unless the code you write is supposed to), and so the pointer to inside INTERPRET just hangs around until the code definition gets to its NEXT call, which causes the INTERPRET word to resume. This will all become clearer when you understand exactly how DOCOLON and EXIT work.

Incidentally, there is also an EXECUTE inside of the compiler loop - it's there to handle IMMEDIATE definitions - the ones that execute even when you are compiling. The logic here is the same as above. The only difference is that the IP will be pointing somewhere inside ] , instead of inside INTERPRET.

----------------------------------------------------------------------------
------------------------- How Forth Does Branching -------------------------
----------------------------------------------------------------------------

In our discussion of NEXT, above, we only talked about sequential execution of words. What happens if we need to branch around words ( as we do in conditionals like IF) or cause the same words to be executed repeatedly ( as we do in DO LOOP or BEGIN UNTIL constructs)?

The answer is actually very simple - we just change the IP to point to the word we want to branch to, and then execute NEXT. If you followed the above discussion on NEXT, it should be obvious that this causes a complete diversion of the flow of control for the current word.

When a branch is compiled, two things are done: a special word that controls the branch is compiled, and the destination address of the branch is compiled. For example:

    : CR'S BEGIN CR AGAIN ;

This word will just print newlines, until a rude action is taken by the operator to stop it. Here is how the compiled word looks in memory.

    (Standard Forth Header goes here)
    $A000  CR       ( execution address of CR)
    $A002  BRANCH   ( execution address of BRANCH)
    $A004  $A000    ( address to BRANCH to)
    $A006  EXIT     ( execution address of EXIT)

When CR'S executes, NEXT executes CR, and then it executes BRANCH.
BRANCH takes the address immediately following it in memory, in this case $A000, and stuffs it into the IP. BRANCH then JMP's to NEXT. Since the IP is once again pointing at CR, (having been changed by BRANCH), NEXT once again executes CR, and then BRANCH, which causes the IP to be changed, and so on, forever.

BRANCH is an example of an unconditional branching primitive - it always branches, no matter what. ?BRANCH is a conditional branching word - it will branch if the value on the top of the stack is FALSE - otherwise, no branch takes place. Here is an example of a word that would cause ?BRANCH to be compiled:

    : CR? ( BOOLEAN -- ) IF CR THEN ;

CR? will obviously print a CR if the top of the stack is non-zero, otherwise, nothing happens. Here is how CR? would look in memory:

    (Standard Forth Header)
    $A000  ?BRANCH  ( execution address of ?BRANCH)
    $A002  $A006    ( destination address of branch )
    $A004  CR       ( execution address of CR)
    $A006  EXIT     ( execution address of EXIT)

In this case, NEXT would execute ?BRANCH first, which tests the value of the top of the stack. Notice that two things can happen here, BOTH of which will change the IP:

    1) If the top of the stack is TRUE, ?BRANCH will force the IP beyond
       the branch address, by adding two. This will obviously cause CR to
       be executed.
    2) If the top of the stack is FALSE, ?BRANCH will act exactly like
       BRANCH, and stuff $A006 (the address stored immediately following
       ?BRANCH) into the IP, which will obviously just EXIT the definition.

In any branching word, one or the other of these two things will happen. All of the branching words are compiled in exactly this way, with the branching primitive first, and the destination address of the branch immediately following it in memory.

The reason that there are more branching primitives in Blazin' Forth than just these two has more to do with entry and exit conditions than it does with the actual branching mechanism.
For example, IF-THEN, IF-ELSE-THEN, BEGIN-UNTIL, BEGIN-WHILE-REPEAT, BEGIN-AGAIN are all implemented with combinations of ?BRANCH and BRANCH, since all of these involve boolean testing of the top of the stack. Things like DO-LOOP and ?DO-LOOP and DO-+LOOP, etc. have additional things to do, like move the loop parameters to the return stack, add or subtract the loop index, test the loop index, and clean up the return stack on the loop exit. But the actual mechanics of branching are exactly the same, only the entry/exit conditions differ from word to word. Among other advantages, this makes the compiler code much simpler, since there are fewer 'special cases' to check for.

Once again, as an aid to understanding, here are sample implementations of BRANCH and ?BRANCH in hi-level forth. As you read these, keep in mind that when BRANCH or ?BRANCH is executing, the IP will be pointing at the branch address - since it gets incremented before the execution of the next word by NEXT:

    : BRANCH ( branch unconditionally STACK: -- )
        IP @    // Get the value of IP, ordinarily the address of the code
                // field of the next word to execute. In this case, it is
                // a branch address.
        @       // Get the value stored at the address - which is the
                // destination branch value.
        IP !    // Change the IP to the destination address.
        NEXT    // and execute.
    ;

    : ?BRANCH ( conditional branch STACK: BOOLEAN -- )
        0= IF           // test top of stack - if FALSE ( equal to zero)
            BRANCH      // just execute BRANCH
        ELSE            // value was TRUE, don't branch.
            2 IP +!     // move IP over branch address, to next word.
        THEN
        NEXT            // and execute.
    ;

----------------------------------------------------------------------------
---------------- How Forth Does Nesting - DOCOLON and EXIT -----------------
----------------------------------------------------------------------------

In the above examples there was never any question of remembering where we came from - the course of execution of the word was changed, and we never really cared to remember what called what. But what about having one colon definition calling another one? How does Forth remember where to come back to when it has finished the called definition?

This is not particularly difficult either. Once again, the IP and W, the current word pointer, play central roles. In what follows, remember that W points to the actual address of the word we want to execute, while the IP points to a memory location which contains the address of the word.

What happens is this: NEXT starts to execute a colon definition. All colon definitions have the same address stored at their execution address, which is the address of a machine language routine called DOCOLON or NEST. It is this routine that is responsible for saving the current execution environment.

DOCOLON first pushes the current value of the IP (which holds the address of the word we want to return to) onto the return stack. At this point, W will be holding the execution address of the new word to execute. We want to execute the body of this word, so DOCOLON now adds two to the value in W, which makes it point to the BODY of this definition, and stuffs it into the IP. DOCOLON now calls NEXT, which causes the new word to execute.

Eventually, NEXT will execute EXIT, which is the word compiled by ; . EXIT's job is to restore the previous execution environment, and it does this very simply, by pulling the top of the return stack, and stuffing it into the IP. It then calls NEXT, which causes the calling word to resume execution as though nothing had happened.
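
The save-and-restore dance described above can be tried out in a toy model. The Python sketch below is only an illustration under loud assumptions: the class `Machine`, the helper `colon`, the words STAR, INNER, OUTER and HALT, and all of the addresses are invented here and do not come from Blazin' Forth. A code field holds a Python function standing in for machine code, and a plain `while` loop stands in for the jump to NEXT that every real primitive ends with.

```python
class Machine:
    """Hypothetical toy model of the Forth virtual machine registers."""

    def __init__(self):
        self.memory = {}      # address -> cell (an address, or a function)
        self.ip = 0           # interpreter pointer
        self.w = 0            # current word pointer
        self.rstack = []      # return stack
        self.output = []      # what the primitives "print"
        self.running = True

    def next(self):
        self.w = self.memory[self.ip]   # fetch the execution address
        self.ip += 2                    # advance the IP for next time
        self.memory[self.w](self)       # "jump" to the code it names


def docolon(m):
    m.rstack.append(m.ip)   # save the caller's IP
    m.ip = m.w + 2          # the body starts just past the code field


def exit_(m):
    m.ip = m.rstack.pop()   # restore the caller's IP


def halt(m):
    m.running = False       # invented primitive that stops the demo loop


def colon(m, addr, *body):
    """Lay down a colon definition: DOCOLON code field, body, then EXIT."""
    m.memory[addr] = docolon
    for i, xt in enumerate(body + (EXIT,)):
        m.memory[addr + 2 + 2 * i] = xt


m = Machine()
STAR, EXIT, HALT = 0x1000, 0x1002, 0x1004       # made-up addresses
m.memory[STAR] = lambda mach: mach.output.append("*")
m.memory[EXIT] = exit_
m.memory[HALT] = halt

INNER, OUTER = 0x2000, 0x3000
colon(m, INNER, STAR)           # : INNER STAR ;
colon(m, OUTER, INNER, STAR)    # : OUTER INNER STAR ;

m.memory[0x4000] = OUTER        # a two cell thread to start from
m.memory[0x4002] = HALT
m.ip = 0x4000
while m.running:
    m.next()

print("".join(m.output), m.rstack)   # prints: ** []
```

The first star is printed from inside INNER, two levels down, and the second after INNER has exited back into OUTER. The return stack is empty again at the end, which is the whole point: each EXIT pops exactly what the matching DOCOLON pushed.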

Here is an example:

    : FOOBAR CR ;
    : COLON-CALL FOOBAR ;

Compiled view of the above:

    (Header for FOOBAR)
    $A000  DOCOLON  ( Code field portion of header)
    $A002  CR       ( Body )
    $A004  EXIT

    (Header for COLON-CALL)
    $B000  DOCOLON  ( Code field portion of header)
    $B002  FOOBAR   ( Body)
    $B004  EXIT

And here is a simplified execution of COLON-CALL.

    Word-to-Execute        IP      W         IP-AT-EXIT   RETURN-STACK
    -------------------------------------------------------------------------
    FOOBAR                 $B002   $A000     $B004        XXXXX
    DOCOLON                $B004   $A000     $A002        $B004
    CR                     $A002   CR's EA   $A004        $B004
    EXIT                   $A004   EXIT EA   $B004        XXXXX
    EXIT (in COLON-CALL)   $B004   EXIT EA   -----        ------

(NOTE: EA stands for Execution Address.)

Once again, here is a sample implementation in higher level forth, of DOCOLON and EXIT:

    : DOCOLON
        IP @    // get current value of IP
        >R      // Save it on return stack
        W @     // Get execution address of current word.
        2+      // Convert to Address of body.
        IP !    // Change IP
        NEXT    // Execute new word
    ;

    : EXIT
        R>      // Get old IP (saved by DOCOLON)
        IP !    // Restore
        NEXT    // Resume execution.
    ;

Since the most recent caller is always at the top of the return stack, the forth system can find its way through any number of levels of nesting, no matter how deep. There is no theoretical limit to the depth of nesting of forth words, although there is the practical limit of the size of the return stack.

So how deeply can one nest definitions in Blazin' Forth? Well, the obvious answer is around 123 levels, since there is an entire page of memory allocated for the return stack. It is equally obvious that certain actions can modify this, such as pushing literals to the return stack in your definitions, or using DO LOOPS, since DO LOOPS store the loop control information on the return stack. Less obviously, you should note that CODE definitions do not cause the above nesting to occur.
The majority of the primitives in Blazin' Forth are CODE definitions, and the desire to maximize the level of nesting was one of the design considerations that led to this decision. In practice, I have never even approached the theoretical maximum level for nesting, much less had a crash that was traceable to return stack overflow, even when using highly recursive words.

----------------------------------------------------------------------------
------------------------- Forth's DOES> construct --------------------------
----------------------------------------------------------------------------

The implementation of the DOES> feature of Forth is usually one of the hardest for people to understand. The thing to remember when we get down to the actual details of the implementation is exactly how the current word pointer W works. When a word is executing, W will contain the execution address of that word. Stored at the address in W is the actual address of the code that is executing. In Forth, we would say that W @ is the execution address of the word, and W @ @ is the address of the code. Keeping this in mind will help you to understand what is going on.

First, a quick refresher on what DOES> does. DOES> is possibly the most unique feature of forth, since it allows you to extend the actual Forth compiler to compile new types of words. DOES> words are compiler words, and as such, they are used to create new words to execute. To help keep the discussion clear, let's call words which contain DOES> parent words, and words which are created by DOES> words, child words.

When a parent word executes, it creates a dictionary entry for the child. When the child executes, it leaves the address of its body on the parameter stack, and then executes the hi-level forth words after DOES> in the parent word.

A common way to teach beginners about DOES> is to redefine one of the Forth primitives, such as CONSTANT, as a DOES> word.
I'll do the same thing here, but I will also try to explain exactly how these words do their thing on an implementation level.

    : CONSTANT CREATE , DOES> @ ;

Here we have our CONSTANT definition. When CONSTANT (the parent) executes, it will create a dictionary entry with a standard header (that's the function of the CREATE in our definition). It then allocates two bytes of parameter space, and compiles the value on the top of the stack into the dictionary (that's the function of the , in our definition). The words following the DOES> don't do anything when CONSTANT executes - they execute when the child word (the word created by CONSTANT) executes.

When the child executes, it will leave the address of its body on the stack, and then the words following the DOES> will be executed. In this example, there is only the @ - which will replace the address of the BODY with the value stored there, just like CONSTANT should, and EXIT, which will return us to wherever we came from.

Thus

    10 CONSTANT TEN

creates a dictionary entry for the name TEN, and a 2 byte parameter field for the value 10, which CONSTANT also stuffs there. Executing TEN will first leave the address of TEN's body on the stack, and then the words following DOES> (in the parent word CONSTANT) will execute, which result in the value 10 being left.

Now for the implementation details: Here is how our definition of CONSTANT would look in the dictionary:

    (Preceded, as always, with the standard forth header )
    $A000  DOCOLON      ( code field portion of header)
    $A002  CREATE       ( execution address )
    $A004  ,            ( execution address )
    $A006  (;CODE)      ( execution address )
    $A008  JMP DODOES   ( actual machine language instructions)
    $A00B  @            ( execution address)
    $A00D  EXIT         ( execution address)

And here is how the definition of TEN would look:

    ( Standard dictionary header goes here)
    $B000  $A008        ( code field portion of header)
    $B002  10           ( value stored in parameter field)

Ok, here is how it all sorts out.
Remember that DOES> is defined as an IMMEDIATE word, and so it executes when you are compiling. The mysterious portion of the CONSTANT definition, above - the (;CODE) and the JMP DODOES - are written into the dictionary whenever DOES> executes.

(;CODE) is an unusual primitive. When it executes, it overwrites the current contents of the code field of the last word added to the dictionary with the address of the machine code which follows it in the definition currently executing. In our example above, it will cause all words created with CONSTANT to have a code field whose value is $A008 - the address of the JMP instruction in CONSTANT. This will obviously cause the JMP DODOES instruction to be executed each time a word created by CONSTANT is executed.

DODOES is the routine that does the actual magic. It must do three things:

    1) It must save the current value of the IP (just like DOCOLON) so
       Forth knows how to get back to the caller.
    2) It must push the address of the child's body to the stack.
    3) It must execute the words following the JMP DODOES in the parent.

Using TEN as an example, DODOES must push the value $B002 to the parameter stack, and it must then cause the words starting at $A00B to be executed. Here is how it's done in Blazin' Forth:

When TEN executes, it should be clear that the value stored in the current word pointer ( W ) is $B000, which is the execution address of TEN. The IP will be pointing somewhere important, so DODOES first saves it, which it does exactly like DOCOLON, by pushing it onto the return stack.

Once the IP has been safely tucked away, we have two tasks to perform. We must push the address of the parameter field of TEN to the stack, and we must then cause the hi-level forth words in the DOES> part of CONSTANT to execute. We can use the value of W to do both these things. Remember that W, the current word pointer, is currently pointing at the execution address of TEN, and so contains $B000.
So it is a simple matter to calculate the address of the body of TEN - we just add two to the current value of W ( which gives us $B002), and push it to the parameter stack.

Now, the value stored at $B000 is $A008, which is the address of the JMP DODOES instruction in CONSTANT. We want to execute the hi-level forth words beyond this instruction - a piece of cake. We simply add 3 ( the size of an absolute JMP instruction on the 6502 ) to the value stored at the execution address of the child word TEN, and stuff it into the IP. Once this has been done, all we need to do is call NEXT, which takes care of everything else, since we just pointed the IP at the proper spot. Since we saved the previous value of the IP first off, when the EXIT at the end of the DOES> stuff is executed, we get returned to whatever called us.

Once again, here is an example of DODOES in hi-level forth:

    : DODOES
        IP @ >R     // Save current IP on return stack
        W @ 2+      // Leave address of parameter field on stack
        W @ @       // Get address of JMP DODOES instruction
        3 +         // Add in size of JMP absolute instruction
        IP !        // Set as new execution address
        NEXT        // and execute it.
    ;

That wasn't so hard, was it?

----------------------------------------------------------------------------
------------------- LITERALS, CONSTANTS, and VARIABLES ---------------------
----------------------------------------------------------------------------

In this last section, I talk about how Blazin' Forth handles compiled literals, and how the words defined by CONSTANT, VARIABLE and USER are implemented.

There are two kinds of literals recognized by Forth, numeric and string. Numeric literals are compiled automagically, by the compiler loop, while string literals are compiled by ." (usually).

Numeric literals first. As you probably remember, when you compile a definition, Forth attempts to look up each word in the definition in the dictionary.
If it finds the word, then it compiles the execution address of the word into the dictionary (unless the word is defined IMMEDIATE, of course!). If the word is not found, then it attempts to convert the string of characters you just fed it into a number. If it succeeds, then it compiles a special primitive called (LIT) into the dictionary, and immediately past that, it places the value of your literal. (If it can't convert it to a number, then it issues the famous "NOT IN CURRENT SEARCH ORDER" message.) Here is an example:

    : BIG 1000 ;

and here is how it looks in the dictionary:

    (standard forth header)
    $A000  (LIT)    ( start of BIG's BODY)
    $A002  1000     ( The value of your literal)
    $A004  EXIT     ( EXIT - Tadah!)

(LIT)'s function is to place the value following it in memory on the top of the parameter stack, and to move the IP over the literal value, to the next valid forth word. It's pretty simple in practice, if you remember that if (LIT) is being executed, the IP must be pointing at the address of the literal, since it was incremented by NEXT. Here it is as an example FORTH definition:

    : (LIT) ( -- 16bit)
        IP @ @      // Get value of literal to stack.
        2 IP +!     // Move IP past literal value, to next valid word.
        NEXT        // and call NEXT
    ;

Blazin' Forth has a memory saving feature for values that will fit in one byte. For these values another word, called CLIT, is compiled instead of (LIT). It works very similarly to (LIT):

    : CLIT ( --- byte-value)
        IP @ C@     // get the byte to the parameter stack
        1 IP +!     // move over byte literal to next valid word.
        NEXT        // and execute next
    ;

The case of string literals is very similar. ." is an immediate word which first compiles (.") . It then searches the input stream for an ending ", and moves everything before this final quote into the dictionary, with a leading count byte, as is normal for Forth. It also moves the pointers to the input stream past the string, so the interpreter won't try to evaluate it. Here is an example:

    : GREETING ." HELLO" ;

And here is how GREETING would look in memory:

    (Header)
    $A000  (.")         ( primitive to print the following inline string)
    $A002  5            ( the length of the string)
    $A003  H E L L O    ( The characters are stored here, one per byte )
    $A008  EXIT

The (.") primitive is one of the few low level words in Blazin' Forth that is actually written in Forth (i.e. it's a colon definition). Since (.") is a colon definition, this means that when (.") is called, DOCOLON will save the current value of the IP on the return stack. But, by a pretty stroke of fate, this will be exactly the address of the string following (.").

To get a little more concrete about it: When GREETING executes, the IP will eventually contain the value $A000. This will cause NEXT to execute (."), but NEXT will first, as always, bump the IP to $A002 (the start of the inline string). When (.") executes, since it is a colon definition, DOCOLON will push $A002 (the current IP) to the return stack, and then enter the definition. So at entry, we have the address of the string on the return stack. All we have to do is retrieve the address, use COUNT and TYPE to display it, and adjust the return address on the stack before we exit. Once the return address has been adjusted and placed back on the stack, EXIT will return us to the word past the end of the inline string.

Here is (."), just as it appears in Blazin' Forth:

    : (.") ( --- )
        R@          ( get string address from return stack)
        COUNT       ( get the count byte, adjust address )
        DUP 1+      ( total length of string, including count byte)
        R> +        ( get address, move past end of string)
        >R          ( and restore, for EXIT)
        TYPE        ( the string)
    ;               ( and return, using adjusted address as return)

So much for literals. The run time action of constants and variables (including USER variables) is determined by the routines pointed to by their code fields. There are no special primitives compiled, as there are with the literals.
Here is a short run down of the actions of each:

    Constants place the value stored in their body on the parameter stack.

    Variables place the address of their body on the parameter stack.

    User variables place the address of the associated variable on the
    stack. The actual value stored in the parameter field is an offset
    from a base address.

Armed with your present knowledge of the IP and W, understanding these definitions should be a snap. They all work very much the same.

We start with variable, since it's the simplest. When the code field of a variable (or constant or USER) is executed, W will contain the execution address (the address of the code field) of the word in question. So it's easy: take the value in W, add two, and leave that value on the stack. Here is a hi-level definition of DOVARIABLE:

    : DOVARIABLE ( -- address)
        W @     // Get the execution address of this variable
        2+      // Add two to get the body.
        NEXT    // and that's it!
    ;

Constants are very similar to variables - the only difference is the extra step required to retrieve the value in the constant's body. Here is a hi-level definition of DOCONSTANT:

    : DOCONSTANT ( -- value)
        W @ 2+  // As in variable - get the address of the body.
        @       // Get the value stored there.
        NEXT
    ;

USER variables are very similar to constants. The only addition here is that we add the base address of the user area to the value stored in the body of the user variable.

    : DOUSER ( -- address )
        W @     // get execution address of this user variable
        2+ C@   // get offset - we only use byte offsets in Blazin' Forth.
        UP @    // get base address of user area
        +       // add to offset to get actual address of variable
        NEXT
    ;

Here is a question for those who want to test their general comprehension of the topics discussed here. Why can't we use the IP instead of W in the definitions of VARIABLE, CONSTANT, and USER ?

Answer: Aside from making the definition more complex, it would be impossible to retrieve the addresses of variables, or the values of constants, when we are typing their names directly into the interpreter from the terminal! Remember that the interpreter launches programs by stuffing the execution address of a word into W. In the following situation, there is no way to get from the address of the IP to the address of the parameter field:

    VARIABLE BLETCH
    BLETCH . XXXX

since the IP is still pointing somewhere inside INTERPRET. The only pointer that is valid to code such as DOVARIABLE in all cases is W.
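
The answer can be made concrete with one more toy model. As before, this Python sketch rests on assumptions that are not in the original: the class `Machine`, the function `execute`, the addresses $5000 and $6000, and the stand-in value $1234 for "somewhere inside INTERPRET" are all invented for illustration. It shows a variable and a constant being launched the way the interpreter does it, through W alone, while the IP is never consulted or changed.

```python
class Machine:
    """Hypothetical toy model: just the registers this example needs."""

    def __init__(self):
        self.memory = {}      # address -> cell (a number, or a function)
        self.stack = []       # parameter stack
        self.w = 0            # current word pointer
        self.ip = 0x1234      # pretend: somewhere inside INTERPRET


def dovariable(m):
    m.stack.append(m.w + 2)             # address of the body


def doconstant(m):
    m.stack.append(m.memory[m.w + 2])   # value stored in the body


def execute(m, xt):
    m.w = xt                  # save the execution address in W
    m.memory[m.w](m)          # run the code its code field names


m = Machine()
BLETCH, TEN = 0x5000, 0x6000            # made-up execution addresses

m.memory[BLETCH] = dovariable           # VARIABLE BLETCH
m.memory[BLETCH + 2] = 0
m.memory[TEN] = doconstant              # 10 CONSTANT TEN
m.memory[TEN + 2] = 10

execute(m, BLETCH)                      # as if BLETCH were typed at the terminal
execute(m, TEN)                         # as if TEN were typed at the terminal

print(hex(m.stack[0]), m.stack[1], hex(m.ip))   # prints: 0x5002 10 0x1234
```

Both words find their bodies from W, and the IP still holds its old value afterwards. Had `dovariable` tried to work from the IP instead, it would have had nothing but $1234 to go on, which says nothing about where BLETCH lives.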